Stylometric Authorship Attribution of Collaborative Documents

Authors

  • Edwin Dauber
  • Rebekah Overdorf
  • Rachel Greenstadt
Abstract

Stylometry is the study of writing style based on linguistic features and is typically applied to authorship attribution problems. In this work, we apply stylometry to a novel dataset of multi-authored documents collected from Wikia using both relaxed classification with a support vector machine (SVM) and multi-label classification techniques. We define five possible scenarios and show that one, the case where labeled and unlabeled collaborative documents by the same authors are available, yields high accuracy on our dataset while the other, more restrictive cases yield lower accuracies. Based on the results of these experiments and knowledge of the multi-label classifiers used, we propose a hypothesis to explain this overall poor performance. Additionally, we perform authorship attribution of pre-segmented text from the Wikia dataset, and show that while this performs better than multi-label learning, it requires large amounts of data to be successful.
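
The abstract names relaxed classification without defining it. A minimal sketch follows, assuming "relaxed" means that a prediction counts as correct when at least one true author appears among the k highest-scoring candidates. The function names, author names, and scores are hypothetical; the scores stand in for per-author SVM decision values. This is an illustration of the general idea, not the authors' implementation.

```python
from typing import Dict, List, Set


def top_k_authors(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate authors with the highest scores."""
    # Sort by descending score; break ties by name so results are deterministic.
    ranked = sorted(scores, key=lambda author: (-scores[author], author))
    return ranked[:k]


def relaxed_accuracy(
    all_scores: List[Dict[str, float]],
    true_authors: List[Set[str]],
    k: int,
) -> float:
    """Fraction of documents with at least one true author in the top k.

    Assumption: a multi-authored document is a hit if ANY of its true
    authors is among the k best-scoring candidates.
    """
    if not all_scores or len(all_scores) != len(true_authors):
        raise ValueError("need one non-empty score dict per document")
    hits = 0
    for scores, truth in zip(all_scores, true_authors):
        if truth & set(top_k_authors(scores, k)):
            hits += 1
    return hits / len(all_scores)


# Hypothetical per-author scores for three documents (illustrative only).
doc_scores = [
    {"alice": 0.9, "bob": 0.4, "carol": 0.1},
    {"alice": 0.2, "bob": 0.5, "carol": 0.6},
    {"alice": 0.7, "bob": 0.3, "carol": 0.6},
]
doc_authors = [{"alice", "bob"}, {"bob"}, {"bob"}]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, relaxed_accuracy(doc_scores, doc_authors, k))
```

Raising k can only keep or increase the measured accuracy, so a relaxed result should always be read together with the k used and the size of the candidate set.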

Similar resources

Deep Sentence-Level Authorship Attribution

We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasibility of deep models for multi-author attribution at the sentence level. Though we attempt to formulate models which learn stylometric features base...

Style based Authorship Attribution on English Editorial Documents

The aim of authorship attribution is to identify the author(s) of unknown document(s). Every author has a unique writing style. The present paper identifies the unique style of each author using lexical stylometric features. The lexical feature vectors of various authors are used in supervised machine learning algorithms to predict the author of an unknown document. The highest ...

Classify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution

Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the document’s author is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional methods used in classification are i...

Investigating Topic Influence in Authorship Attribution

The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. In order to investigate this hypothesis, we created a special corpus, which was controlled for topic and author simultaneously. The corpus ...

Stylometry and collaborative authorship: Eddy, Lovecraft, and 'The Loved Dead'

The authorship of the 1924 short story ‘The Loved Dead’ has been contested by family members of Clifford Martin Eddy, Jr. and Sunand Tryambak Joshi, a leading scholar on Howard Phillips Lovecraft. The authors of this article use stylometric methods to provide evidence for a claim about the authorship of the story and to analyze the nature of Eddy’s collaboration with Lovecraft. Further, we exte...

Publication date: 2017